Blog - Analytixus - The Analytics-DRUID

Analytixus-admin
Jul, Mon, 2026

DnAML , supervised / unsupervised Metadata

When AI Doesn’t Just Write, but Gets Validated: The Analytixus MCP Server in Practice

Most AI demos I see follow the same pattern: a model writes text or code, a human reads it, and whether it actually holds up only shows later, at compile time, at deployment, in the worst case in production.

What has kept me thinking about the Model Context Protocol (MCP) for months is the chance to close that gap. Not by making the AI more cautious, but by having it work directly against a system that contradicts it the moment it gets something wrong.

I tested this with the Analytixus MCP server, on a task I would otherwise have done by hand: filling in missing references in a Lean Data Vault model. The result surprised me. Not because it was error-free, but because the errors showed up in the right place.

The Starting Point: A Model That’s Missing Something

The “Test-DBX” solution contains a Lean Data Vault model based on Microsoft’s Wide World Importers demo database: a bronze layer with the raw data and a silver layer with 31 Hist1 tables (historized facts) and 31 matching key tables. What was missing were the cross-references between these Hist1 tables: the foreign key relationships that have long existed in the source system but hadn’t yet been carried over into the DnAML model.

Exactly the kind of grunt work no one volunteers for: go through 31 tables, identify foreign keys, name them correctly, get them into the right format.

Step 1: Claude Connects and Immediately Knows What It Can Do

First the foundation: Claude is connected to the repository through the Analytixus MCP server and knows its tools before a single domain question is even asked: repository management, solutions, DnAML reference. This MCP server is the headline feature of Analytixus 0.7 (plus a few bug fixes), and it’s the prerequisite for everything that follows.

That’s the basic precondition for everything else: Claude doesn’t work “from the gut,” but against a clearly defined interface.

Step 2: “What Is DnAML?” Answered From the Source, Not From Training

DnAML, short for Data and Analytics Markup Language, is a homegrown, very young DSL. There’s no chance a language model has seen anything about it in its training data. And yet Claude describes its structure and syntax correctly, with all the building blocks (TABLE/VIEW, DOCUMENTATION, LOAD, ORIGINS, REFERENCES, CONSUMERS, GLOSSARY, SOURCES/FOLDER) and a working example.

The difference from a generic language-model answer: this explanation doesn’t come from training, but from a tool call against the DnAML reference that the MCP server provides. Claude fetched the knowledge live, it didn’t guess.

Step 3: Establishing Context

A look at the available solutions shows how casually this works: a quick question, a table of eight projects, done.

Step 4: The Actual Task: Deriving 69 References From Foreign Keys

Now it gets interesting. I describe the context: solution “Test-DBX,” source “WWI” already connected, bronze and silver layers already built following the Lean Data Vault pattern. The task: read all metadata recursively, analyze the foreign key relationships between the 31 Hist1 tables, and derive DnAML references from them.

Claude first reads in all the metadata, then analyzes the foreign key relationships and identifies 69 references across all Hist1 tables, including a clean summary per table.

Two details show that this isn’t simple pattern matching on column names:

– Nullable foreign keys are correctly marked as `INFERREDMEMBER`. So Claude distinguishes between mandatory and optional relationships.

– 13 Hist1 tables are explicitly identified as having no cross-references, because they are pure master-data / lookup tables (Countries, DeliveryMethods, PaymentMethods, etc.). That’s technically correct, and Claude says so, instead of inventing references where there are none.

At the end, the decisive question comes from Claude itself: *”Can you confirm the references as they are, or are there corrections?”* No automatic write-through, but a checkpoint before anything gets changed.

Step 5: The Correction, and How It Lands Immediately

And here comes the part that makes the difference. Claude had given every single reference a name. But that’s only technically correct when it’s a role-playing reference, that is, when several foreign keys of the same table point to the same target and therefore need to be distinguishable (e.g. `Delivery`/`Postal` for two references to `Cities_Key`, or `PrimaryContact`/`AlternateContact` for two references to `People_Key`). If only a single reference points to a target, it doesn’t need a name.

The response to that is the real proof: *”The concept is clear. […] I have no questions and will now create the V2.”* No follow-up question, no uncertainty: the rule is understood immediately, generalized correctly, and reapplied across all 69 references. Exactly the kind of correction you normally have to repeat several times before it sticks.

Step 6: The Closed Loop: Writing With Immediate Validation

The last and most important step: instead of just producing a file, Claude writes the `REFERENCES` blocks directly into the respective Hist1 tables in the repository through the MCP server, including the exact formatting rules (position after the `ORIGINS` block, comma placement).

The decisive point: this validation doesn’t come from Claude. It comes from Analytixus itself. On save, the engine checks the DnAML syntax. If something were wrong, the write would fail or return an error. So Claude gets a real, incorruptible feedback signal, not a “looks plausible.”

Only after this successful two-table pilot, with a green check on both, comes the go-ahead for the rest: *”Well done, the result is exactly as expected. Now write all remaining references into the respective Hist1 table.”*

Claude processes the remaining tables in parallel batches, and every single one is validated by Analytixus on write. In the end: all tables with a green check, not a single syntax error.

App Code Is Craftsmanship, Data & Analytics Is Series Production

At this point it’s worth asking why I have the AI work against a metadata tool like Analytixus at all, instead of simply letting it write the SQL directly. It could just as easily formulate the CREATE TABLE statements and ingest pipelines itself. But that’s exactly what I don’t want, and the reason has to do with the nature of the work.

For app coding, an LLM is in its element: every function is one of a kind, every requirement its own case, and there’s rarely one correct pattern that everything has to follow. Here, room to maneuver is even desirable, and the AI delivers superbly.

Data & Analytics is the opposite. Here I want a CREATE TABLE statement to look exactly the same for every single table, whether it’s about customers, invoices, or stock levels: the same technical columns, the same quoting, the same order, the same rules. This isn’t craftsmanship, it’s series production, where every part comes off the line following the same blueprint.

And this is exactly where a freely formulating LLM becomes a risk. When it writes a thousand CREATE TABLE statements, at some point it decides that a technical column is actually redundant for this one table. Or it stops quoting a name because it deemed it unnecessary in that spot. No ill intent, just the nature of the model: it doesn’t strive for determinism, it strives for plausible. But for series production, plausible isn’t good enough.

That’s why the division of labor this whole process demonstrates: the LLM does what it’s genuinely good at, namely understanding and analyzing which references exist in the domain. The deterministic generation of the code is handled by the engine, from metadata, following fixed rules, identical for every table. DnAML is the boundary between the two worlds: the AI thinks in metadata, not in DDL. It never gets its hands on the part that’s always the same in the first place.

Why This Convinces Me More Than Most Agent Demos

There were three moments in this process that together make the difference:

1. Knowledge from the source, not from training. The DnAML explanation came from a live tool call, not from half-learned trivia about a DSL that can’t even exist in the training corpus.

2. A correction that generalizes immediately. A single, clearly stated domain rule was understood and applied correctly across 69 references, without follow-up questions, without repetition.

3. Validation by the system, not by trust. Every change was checked by the Analytixus engine itself on write. That’s the real value of MCP in a tool like this: the AI doesn’t just get access, it gets a referee that tells it at every step whether it got it right.

In the end, all that was left for me was the role of reviewer at the right moments: confirm the reference list once, correct a naming convention once, release the pilot once. The actual grunt work (analyzing 31 tables, naming and writing 69 references correctly) took care of itself, without me typing a single line of DnAML myself.

And that’s exactly where the real question lies for me: how many of our AI workflows today really let the AI work against a referee that pushes back instantly, instead of just against our trust? How do you handle this in your projects?

Andy Löwen
Nov, Sat, 2025

Framework , Python

Code Generation in Analytixus with Python and Jinja2: A Practical, Scalable Pattern

Audience: Analytixus users and Data Engineers

This article demonstrates how to generate SQL code from metadata in Analytixus using Python and Jinja2. We use a simple, understandable example (a CREATE TABLE statement) to show the pattern clearly. In real projects, you will modularize and extend this approach for more artifacts (views, DDL migrations, MERGE statements, CDC logic, and entire load pipelines). Treat everything below as an example to illustrate the principles; future modularization is both necessary and recommended.

Jinja-Documentation

Jinja2 at pypi

Why Python + Jinja2 in Analytixus?

Scalability: One metadata object → one code file; hundreds of objects → hundreds of files, automatically.
Consistency: A single source‑of‑truth template enforces conventions across all generated artifacts.
Speed of change: Update the template or type mapping once; regenerate all outputs.
Separation of concerns: Business metadata lives in Analytixus; rendering logic lives in templates and small utility functions.

The Stable Interface Contract in Analytixus

Analytixus calls your generator script in a standard way:

Method signature: generate(md: dict) -> str
Input: A single parameter md that contains the parsed JSON metadata.
Output: A single string that contains the generated code, which Analytixus writes to a file.
Constraint: Do not change the method name generate, the parameter contract, nor the idea of returning a string. Internals are up to you, as long as the method returns the expected string.

Project Structure (DBX-Demo-WWI)

We’ll use the following layout:

DBX-Demo-WWI (project-root)/
  Bronze/
      02_BronzeStoreWithPy.pyx      # contains the generate(md) function
  Jinja_templates/
      bronze_store_table.sql.jinja2
  Silver/
      ...

The generator lives in Bronze/02_BronzeStoreWithPy.pyx.
Templates live in Jinja_templates/ at the project root.
Silver (and other layers) will follow similar generation patterns later.

Example Metadata (from Analytixus)

We’ll generate a Databricks SQL CREATE TABLE for the WideWorldImporters source table Application.Cities using this metadata sample:

{
  "StartIndex": "53",
  "BraceOpenIndex": "188",
  "BraceCloseIndex": "963",
  "Id": "Source.WWI.Application.Cities",
  "SchemaName": "Application",
  "ObjectName": "Cities",
  "TableName": "Application.Cities",
  "ObjectType": "Table",
  "CatalogName": "bsh-sqldb-WideWorldImporters",
  "SourceName": "WWI",
  "SourceType": "SqlDatabase",
  "Column": [
    {
      "StartIndex": "209",
      "EndIndex": "297",
      "Id": "Source.WWI.Application.Cities.CityID",
      "ColumnName": "CityID",
      "Ordinal": "1",
      "DataType": "INT",
      "IsNullable": "0",
      "IsPK": "1",
      "IsBK": "0",
      "IsKey": "1",
      "BasisDataType": "INTEGER",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "304",
      "EndIndex": "397",
      "Id": "Source.WWI.Application.Cities.CityName",
      "ColumnName": "CityName",
      "Ordinal": "2",
      "DataType": "NVARCHAR",
      "Length": "50",
      "IsNullable": "0",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "ALPHANUMERIC",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "404",
      "EndIndex": "490",
      "Id": "Source.WWI.Application.Cities.StateProvinceID",
      "ColumnName": "StateProvinceID",
      "Ordinal": "3",
      "DataType": "INT",
      "IsNullable": "0",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "INTEGER",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "497",
      "EndIndex": "580",
      "Id": "Source.WWI.Application.Cities.Location",
      "ColumnName": "Location",
      "Ordinal": "4",
      "DataType": "GEOGRAPHY",
      "IsNullable": "1",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "GEOGRAPHY",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "587",
      "EndIndex": "681",
      "Id": "Source.WWI.Application.Cities.LatestRecordedPopulation",
      "ColumnName": "LatestRecordedPopulation",
      "Ordinal": "5",
      "DataType": "BIGINT",
      "IsNullable": "1",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "INTEGER",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "688",
      "EndIndex": "771",
      "Id": "Source.WWI.Application.Cities.LastEditedBy",
      "ColumnName": "LastEditedBy",
      "Ordinal": "6",
      "DataType": "INT",
      "IsNullable": "0",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "INTEGER",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "778",
      "EndIndex": "863",
      "Id": "Source.WWI.Application.Cities.ValidFrom",
      "ColumnName": "ValidFrom",
      "Ordinal": "7",
      "DataType": "DATETIME",
      "Length": "7",
      "IsNullable": "0",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "TIME",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "870",
      "EndIndex": "952",
      "Id": "Source.WWI.Application.Cities.ValidTo",
      "ColumnName": "ValidTo",
      "Ordinal": "8",
      "DataType": "DATETIME",
      "Length": "7",
      "IsNullable": "0",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "TIME",
      "IsTechnical": "0"
    }
  ]
}

Notes:

Many numeric-like fields are strings (e.g., "Ordinal": "1"). Normalize them in code.
Some types have auxiliary fields like Length; compose NVARCHAR(50) before mapping to a Databricks-compatible type.

The Jinja2 Template

We render a Databricks SQL CREATE TABLE for the bronze “Store” layer. Technical columns appear first, then business columns from metadata, followed by a PK constraint.

CREATE TABLE IF NOT EXISTS `{{ schema }}`.`Store_{{ table_name }}`
(
    -- Technical Fields
{%- for col in technical_columns %}
    {% if loop.first %}{% else %},{% endif %}`{{ col.name }}` {{ col.data_type }}{% if not col.nullable %} NOT NULL{% endif %}{% if col.comment %} /*{{ col.comment }}*/{% endif %}
{%- endfor %}
    -- Custom Fields
{%- for col in business_columns %}
    , `{{ col.name }}` {{ col.data_type }}{% if not col.nullable %} NOT NULL{% endif %}{% if col.is_business_key %} /*IsBK*/{% elif col.comment %} /*{{ col.comment }}*/{% endif %}
{%- endfor %}
    -- PK-CONSTRAINTs
    ,CONSTRAINT `PK_{{ catalog }}_{{ schema }}_Store_{{ table_name }}` PRIMARY KEY ({% for pk in primary_key_columns %}{% if not loop.first %}, {% endif %}`{{ pk }}`{% endfor %})
);

Template tips:

Use loop.first to avoid trailing commas.
Keep comments to aid auditing (/*IsBK*/ etc.).
Adopt consistent naming conventions (e.g., Store_{{ table_name }}).

The `generate(md)` Function in Bronze/02_BronzeStoreWithPy.pyx

Below is a compact, end-to-end example that:

Normalizes incoming column metadata,
Maps SQL Server/generic types to Databricks-compatible types,
Loads the Jinja2 template from ../Jinja_templates,
Returns the SQL string.

This is an illustrative example. In real projects, modularize and extract reusable helpers.

from jinja2 import Environment, FileSystemLoader

def generate(md: dict) -> str:
    """Receives metadata, generates code and returns it as string."""
    # Prepare technical columns
    technical_columns = [
        {"name": "DWH_DEF_DATE", "data_type": "TIMESTAMP", "nullable": False, "comment": "IsTechnical"},
        {"name": "DWH_LOAD_ID", "data_type": "INT", "nullable": False, "comment": "IsTechnical"},
        {"name": "DWH_RECORD_ID", "data_type": "BIGINT GENERATED ALWAYS AS IDENTITY", "nullable": False, "comment": "IsTechnical"},
        {"name": "DWH_SOURCE_ID", "data_type": "INT", "nullable": False, "comment": "IsTechnical"},
        {"name": "DWH_SOURCETABLE", "data_type": "STRING", "nullable": True, "comment": "IsTechnical"}
    ]

    # Type mapping
    DATABRICKS_TYPE_MAPPING = { ... }  # as above
    def normalize_type_for_databricks(data_type: str) -> str: ...  # as above

    def _to_column(input_col: dict) -> dict:
        """Convert a column dictionary to a column definition dictionary."""
        return {
            "name": input_col["ColumnName"],
            "data_type": normalize_type_for_databricks(input_col["DataType"]),
            "nullable": bool(input_col["IsNullable"]),
            "is_business_key": bool(input_col["IsBK"]),
            "Ordinal": int(input_col["Ordinal"]),
        }

    business_columns = sorted([_to_column(in_col) for in_col in md["Column"]], key=lambda x: x["Ordinal"])

    # NOW GENERATE!
    env = Environment(loader=FileSystemLoader(r"C:\Temp\ProjectData\DBX-Demo-WWI\jinja_templates"))
    template = env.get_template("bronze_store_table.sql.jinja2")

    return template.render(
        catalog=md["CatalogName"],
        schema=md["SchemaName"],
        table_name=md["ObjectName"],
        technical_columns=technical_columns,
        business_columns=business_columns,
        primary_key_columns=["DWH_RECORD_ID"]
    )

Key points:

Uses relative path resolution to keep the build portable across environments.
Composes raw types (e.g., NVARCHAR(50)) before mapping to Databricks-compatible types.
Defaults complex types like GEOGRAPHY and XML to STRING for bronze ingestion; adjust for curated layers.

Example Output (from the provided metadata)

With the metadata above, the template produces the following Databricks SQL:

CREATE TABLE IF NOT EXISTS `Application`.`Store_Cities`
(
	-- Technical Fields
	`DWH_DEF_DATE` TIMESTAMP NOT NULL /*IsTechnical*/
	,`DWH_LOAD_ID` INT NOT NULL /*IsTechnical*/
	,`DWH_RECORD_ID` BIGINT GENERATED ALWAYS AS IDENTITY NOT NULL /*IsTechnical*/
	,`DWH_SOURCE_ID` INT NOT NULL /*IsTechnical*/
	,`DWH_SOURCETABLE` STRING /*IsTechnical*/
	-- Custom Fields
	, `CityID` None /*IsBK*/
	, `CityName` None /*IsBK*/
	, `StateProvinceID` None /*IsBK*/
	, `Location` None /*IsBK*/
	, `LatestRecordedPopulation` None /*IsBK*/
	, `LastEditedBy` None /*IsBK*/
	, `ValidFrom` None /*IsBK*/
	, `ValidTo` None /*IsBK*/
	-- PK-CONSTRAINTs
	,CONSTRAINT `PK_bsh-sqldb-WideWorldImporters_Application_Store_Cities` PRIMARY KEY (`DWH_RECORD_ID`)
);

Operational Flow in Analytixus

The build process selects metadata objects and calls generate(md) for each.
Your function returns a string; Analytixus writes each result to its destination file (e.g., based on object name and schema).
Determinism matters for reviews and CI: sort columns by Ordinal, not by name.

Testing and Validation

Add unit tests with representative metadata samples to verify:
- Type normalization (e.g., NVARCHAR(50) → STRING).
- Precision/scale preservation for DECIMAL(38,18) (when present).
- Defaults for complex types (GEOGRAPHY, XML) — choose strict vs. lenient behavior per layer.
Validate required fields early (e.g., SchemaName, ObjectName, Column presence). Fail fast on missing essentials to avoid silent mis-generation.

Extending Beyond CREATE TABLE

This pattern generalizes easily:

Generate views (silver/gold), MERGE statements, CDC logic, orchestration/job configs, and more.
Emit multiple files per object (e.g., DDL + DML + pipeline configuration).
The CREATE TABLE example is intentionally simple to demonstrate the approach. In real projects, modularize helpers, factor configurations, and introduce environment-aware conventions.

Best Practices and Pitfalls

Prefer relative template paths to avoid hard-coded strings and ensure portability.
Keep comments and naming conventions consistent and documented (e.g., Store_ prefix).
Separate technical vs. business columns clearly; later, make technical columns configurable when modularizing.
Make unknown types a conscious decision: warn or fail. Bronze layers often accept strings; curated layers should be stricter.
Preserve precision/scale when mapping numeric types.
Ensure deterministic output for clean diffs and reliable CI/CD.

Conclusion

Python + Jinja2 provides a clean, scalable way to transform Analytixus metadata into consistent, reviewable code. With a stable generate(md) -> str contract, robust type normalization, and deterministic templates, you can evolve from simple DDL generation to complete, metadata-driven pipelines across bronze, silver, and gold layers. This article’s code is a minimal example to illustrate the pattern; in production, modularize utilities, configurations, and tests to keep the system maintainable and flexible.

Andy Löwen
Oct, Sun, 2025

DnAML

DnAML (Data and Analytics Markup Language): A Practical Guide for Analytixus

Data and analytics teams juggle multiple concerns: raw source onboarding, semantic enrichment, lineage tracking, dependency management, and consistent code generation across pipelines and layers. DnAML (Data and Analytics Markup Language) is a domain-specific, JSON-like metadata format designed to unify these concerns in a single, readable source of truth. This post explains what DnAML is, what you can do with it, why you might need it, and how to use it—plus small, copy-ready examples you can adapt for your Analytixus projects.

What is DnAML?

DnAML is a compact, human-readable language for semantically describing and orchestrating data sources, structures, and transformations in analytics solutions. It lets you centrally define and control data models along the processing chain—from raw ingestion through semantic modeling to analytical delivery. Conceptually, unsupervised metadata extracted from source systems is transformed into a curated DnAML model, forming the second stage in the metadata life cycle.

DnAML borrows a familiar, block-oriented structure reminiscent of JSON but adds its own semantics for:

Declaring tables and views
Modeling column types, nullability, defaults, keys, and free-form options
Defining lineage via ORIGINS with UNION/JOIN behavior
Expressing cross-object relationships via REFERENCES
Organizing models with SOURCES and FOLDER constructs

These core constructs are part of the language grammar and enable consistent, machine- and human-friendly modeling of data assets.

What can you do with DnAML?

Define data objects (tables, views) and their attributes with clear types, constraints, and options.
Document lineage with ORIGINS, composing objects from upstream OBJECTs using UNION or JOIN and explicit field mappings.
Specify relationships and matching rules through REFERENCES, including comparisons, ranges, constants, and optional expressions to annotate logic.
Organize models by SOURCE (systems) and FOLDER (domains or layers like Bronze, Silver, Gold) to reflect your architecture.
Capture operational metadata via free-form options in square brackets—for example, schema names, owners, synchronization hints, and more.

Why DnAML?

Teams need a consistent way to represent:

What an object contains (columns, types, constraints),
Where it comes from (lineage and mapping),
How it relates to other objects (references),
How the model is organized (sources, folders, layers).

By keeping all of this in one language, you gain:

Clarity and maintainability across pipeline stages
Versionability of models as code
A foundation for automation (generating DDL, ETL/ELT jobs, notebooks, or validation rules)
End-to-end traceability of attribute origins across processing steps

The DnAML grammar natively supports all of these aspects, aligning modeling, lineage, and relationships in a single definition.

How DnAML is used in practice

DnAML uses statements and blocks with consistent syntax:

Top-level statements include TABLE, VIEW, REFERENCES, SOURCES, and FOLDER.
Blocks use braces { ... }, entries are comma-separated, and semicolons are optional.
Identifiers can be dot-qualified (e.g., Application.Cities) or quoted when they include spaces or special characters (e.g., "Order Item").
Free-form options appear in square brackets [ ... ] as either bare terms or key: value pairs, where values can be identifiers, strings, or numbers.
Comments use /* ... */ and are ignored by parsers.

Workflow: Integrating XML and DnAML

In a typical metadata management workflow, XML and DnAML complement each other:

Unsupervised metadata: Automatically extracted metadata from source systems or code is initially stored in XML. This stage captures raw, uncurated information.
DnAML modeling: Unsupervised metadata is converted into a DnAML model, where it is structured, semantically enriched, and maintained. Here, you also define mappings between source and target structures.
Supervised metadata: Once curated and validated in DnAML, the model is persisted back to XML—but now as supervised metadata. This XML becomes the basis for automated processing, such as generating data structures, pipelines, or validation artifacts.

Examples

Below are small examples you can copy into your projects. They illustrate sources, tables, views, lineage, and options. Each example is presented in two code block variants for easy copying.

1) Declaring a source and a table with typed columns and options

Sources {
    Source WWI  {
        Table Application.Cities [SchemaName:Application,SourceName:wwierp] {
            Columns {
                CityID INT NOT NULL PRIMARY KEY [BasisDataType:INTEGER,Ordinal:1,IsKey:1,IsTechnical:0],
                CityName NVARCHAR(50) NOT NULL [BasisDataType:ALPHANUMERIC,Ordinal:2,IsKey:0,IsTechnical:0],
                ...
            }
        }
    }
}

Uses SOURCES and Source to group objects; Table includes a Columns block with types, nullability, primary key, and options.

2) Bronze-layer view with lineage (Origins) and semantic options

Folder Bronze { 
    View Bronze_Access.Application_Cities [SchemaName:'Bronze_Access'] {
        Columns {
            DWH_SOURCE_ID INT NOT NULL [BasisDataType:INTEGER, IsTechnical:1], 
            CityID INT NULL UNIQUE [BasisDataType:INTEGER, IsKey:1, OriginColumnName:'CityID', OriginId:'Bronze.Bronze_Store.Application_Cities.CityID'], 
            ... 
        }
        Origins { 
            Object "Bronze.Bronze_Store.Application_Cities" AS Stage [AliasName:Stage, SchemaName:'Bronze_Store'] 
        } 
    } 
}

Uses FOLDER to organize layer-specific models; View includes Columns and an Origins block with an OBJECT, aliased via AS, and free-form options.

3) Columns, defaults, identity, and constraints

Table Sales.Order [SchemaName:dbo] {
  Columns {
    OrderId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    OrderDate DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    CustomerId INT NOT NULL,
    Comment NVARCHAR(200),
    Total DECIMAL(18,2) NOT NULL
  }
}

Shows supported types, IDENTITY with seed and increment, NULL/NOT NULL, defaults, and constraints (PRIMARY KEY, UNIQUE).

4) Lineage with UNION/JOIN and field mappings

View rpt.ActiveOrders [SchemaName:dbo] {
  Columns { OrderId INT, CustomerId INT, Status VARCHAR(20) },
  Origins JOIN [on:OrderId] {
    Object erp.Orders AS o [system:'ERP'] {
      o.OrderId = rpt.ActiveOrders.OrderId
    },
    Object crm.OrderExtras AS e {
      e.OrderId = rpt.ActiveOrders.OrderId
    }
  }
}

ORIGINS declares composition via JOIN; OBJECT entries can have aliases and free-form options; mappings use SourceId = TargetId.

5) References (relationships) with rules

References [strict:true] {
  Reference FK_Order_Customer FOREIGNOBJECT:Sales.Order PRIMARYOBJECT:dbo.Customer [cascade:false] {
    CustomerId = Id,
    OrderDate BETWEEN MinDate AND MaxDate,
    Status = 'Active' [weight:10],
    'VIP' = Tag,
    CustomerId = Id EXPRESSION: 'Customer.Id equals Order.CustomerId'
  }
}

REFERENCES groups REFERENCE entries; supports comparisons between identifiers and constants, BETWEEN ranges, and EXPRESSION annotations to document logic.

Getting started

Model sources and organization:
Use SOURCES to declare upstream systems and FOLDERs to group objects by domain or layer (e.g., Bronze, Silver, Gold).
Define objects:
Create TABLE or VIEW blocks with COLUMNS, including types, nullability, identity, defaults, and constraints.
Add lineage:
Use ORIGINS with UNION/JOIN; add OBJECT entries with optional aliases and field mappings.
Describe relationships:
Use REFERENCES with REFERENCE rules for FK-to-PK matching, constants, ranges, and annotations.
Use options for metadata:
Add square-bracketed [ ... ] options anywhere to capture schema, ownership, precision, hints, etc.

Tips and best practices

Quote identifiers that contain spaces or special characters (e.g., "Order Item") to avoid parsing ambiguity.
Keep operational and semantic metadata in [ ... ] options; this separates core schema from auxiliary attributes without cluttering structure.
Be explicit about lineage behavior:
Choose UNION vs. JOIN in ORIGINS to document composition semantics clearly.
Use REFERENCES consistently:
FOREIGNOBJECT represents the foreign-key side; PRIMARYOBJECT the primary-key side.

Advanced: Expressions and operator precedence

DnAML supports expressions for defaults and rules. You can use identifiers, strings, numbers, or tuples; unary operators include NOT, plus/minus, bitwise NOT; binary operators include arithmetic, bitwise, comparison, AND/OR, LIKE/IN, and their NOT variants. Operator precedence ensures predictable evaluation of complex expressions.

DEFAULT (Amount + Tax) / 2
DEFAULT NOT (Status = 'Inactive')

Parentheses control evaluation order; compound tokens like NOT LIKE and NOT IN are treated as single operators in parsing.

Where DnAML fits in Analytixus

Bronze/Silver/Gold modeling:
Use FOLDERs to organize layers and Views/Tables to declare layer-specific schemas, lineage, and rules.
Automation:
Curated DnAML models can drive generation of DDL scripts, pipeline tasks, notebooks, or validation configurations.
Governance:
Semantic enrichment in DnAML plus supervised XML persistence enables transparent lineage and audit-ready documentation.

Summary

DnAML provides a unified, domain-specific language to describe your analytics models: sources, objects, attributes, lineage, and relationships. It is readable enough for collaborative design and precise enough for reliable automation. If you’re building or maintaining Data Lakehouse or analytic platforms, DnAML can serve as the backbone of your metadata-driven workflows—bridging raw extraction and curated, supervised models that power repeatable, scalable data engineering.

Andy Löwen
Sep, Thu, 2025

Framework

Lean Data Vault: A Pragmatic Approach to Data Modeling in the Lakehouse

Every data and analytics project (data lakehouse) needs a framework with best practices. This one is definitely worth reading if you’re looking for a data modeling approach that doesn’t overcomplicate things, focuses on data quality, and supports massive parallel processing.

Ping back: https://andyloewen.de/2025/09/17/lean-data-vault-a-pragmatic-approach-to-data-modeling-in-the-lakehouse/

Andy Löwen
Sep, Thu, 2025

DnAML , supervised / unsupervised Metadata , XSLT

Enriching Metadata from Code Comments in Analytixus

In modern Data & Analytics projects, metadata is the backbone of automation and governance. Most of the time, we can derive structural metadata directly from the code — for example, by parsing SQL CREATE TABLE statements into XML-based models.

But sometimes, SQL alone isn’t enough. There are attributes we want to capture — like IsBusinessKey, IsTechnical, or History Flags — that cannot be expressed syntactically in the DDL.

This is where code comments come into play. In Analytixus, we use comments as metadata carriers. During the build process, they are extracted, transformed, and converted into actionable metadata.

From Code to Metadata

Here’s an example generated with an XSLT template:

CREATE OR REPLACE TABLE `WWI`.`Store_Countries`
(
    -- Technical Fields
    `DWH_DEF_DATE` TIMESTAMP NOT NULL /*IsTechnical*/
    ,`DWH_LOAD_ID` INT NOT NULL /*IsTechnical*/
    ,`DWH_RECORD_ID` BIGINT GENERATED ALWAYS AS IDENTITY NOT NULL /*IsTechnical*/
    ,`DWH_SOURCE_ID` INT NOT NULL /*IsTechnical*/
    ,`DWH_SOURCETABLE` STRING /*IsTechnical*/

    -- Custom Fields
    , `CountryID` INT NOT NULL
    , `CountryName` STRING NOT NULL
    , `FormalName` STRING NOT NULL
);

Notice the inline comment /*IsTechnical*/.
In the resulting XML metadata, this becomes:

<Column ColumnName="DWH_RECORD_ID"
        DataType="BIGINT"
        IsNullable="0"
        IsTechnical="1"
        IsPK="1"
        IsKey="1" />

Why Comments Matter

Some attributes simply cannot exist in pure SQL syntax:

Business Keys (IsBK)
Technical Fields (IsTechnical)
History Flags (IsHist)
Origin References (linking to Bronze/Silver layers)

Embedding these semantics as inline comments makes them machine-readable.

Example:

CREATE TABLE [Silver_Store].[Application_Cities_Key] 
/*IsHist:0; Name:Application_Cities; OriginId:Bronze.Bronze_Access.Application_Cities; HistObjectId:Silver.Silver_Store.Application_Cities_Hist1*/ 
(
    [DWH_Application_CitiesId] INT IDENTITY(2, 1) NOT NULL /*IsTechnical; IsPK*/
    ,[CityID] INT /*IsPK:0; IsBK:1*/
)

From Unsupervised to Supervised Metadata

During the build process, Analytixus first generates unsupervised metadata:

<Object Id="Silver.Silver_Store.Application_Cities_Key"
        ObjectType="Table"
        IsHist="0"
        Name="Application_Cities"
        OriginId="Bronze.Bronze_Access.Application_Cities"
        HistObjectId="Silver.Silver_Store.Application_Cities_Hist1">
  <Column ColumnName="CityID" DataType="INT" IsBK="1" />
   ...
    <Origins>
      <Object Id="Bronze.Bronze_Access.Application_Cities" SchemaName="Bronze_Access" ObjectName="Application_Cities" ViewName="Bronze_Access.Application_Cities" ObjectType="View">
        <Column Id="Bronze.Bronze_Access.Application_Cities.DWH_SOURCE_ID" ColumnName="DWH_SOURCE_ID" Ordinal="1" DataType="INT" IsNullable="0" IsPK="0" IsBK="0" IsKey="0" BasisDataType="INTEGER" OriginAliasName="Stage" OriginColumnName="DWH_SOURCE_ID" OriginId="Bronze.Bronze_Store.Application_Cities.DWH_SOURCE_ID" IsTechnical="1" />
   ...
      </Object>
   </Origins>
    <Hist>
      <Object Id="Silver.Silver_Store.Application_Cities_Hist1" SchemaName="Silver_Store" ObjectName="Application_Cities_Hist1" TableName="Silver_Store.Application_Cities_Hist1" ObjectType="Table" IsHist="1" Name="Application_Cities" OriginId="Bronze.Bronze_Access.Application_Cities" KeyObjectId="Silver.Silver_Store.Application_Cities_Key">
        <Column Id="Silver.Silver_Store.Application_Cities_Hist1.DWH_Application_CitiesId" ColumnName="DWH_Application_CitiesId" Ordinal="1" DataType="INT" IsNullable="1" IsPK="0" IsBK="0" IsKey="0" BasisDataType="INTEGER" IsTechnical="1" />
        ...
      </Object>
    </Hist>
</Object>

Then, attributes like HistObjectId and OriginId are resolved, and the metadata is enriched into supervised metadata:

History objects are linked.
Origins are connected to their Bronze counterparts.
Relationships and constraints are automatically added.

This metadata can now be consumed by further XSLT templates and DNA-ML models.

Benefits of Using Comments as Metadata

✅ Flexibility: Developers influence metadata directly in the code.
✅ Consistency: Attributes are extracted systematically.
✅ Traceability: Origins and history objects provide lineage across layers.
✅ Automation: Downstream templates can act on enriched metadata.

Conclusion

By using code comments as metadata carriers, we bridge the gap between SQL definitions and rich semantic models.

In Analytixus, comments are not just documentation — they are drivers of automation. They ensure that attributes like IsBK, IsTechnical, and IsHist become first-class citizens in the metadata layer.

This approach transforms unsupervised metadata into supervised metadata, enabling automation, lineage, and governance without slowing down development.

When AI Doesn’t Just Write, but Gets Validated: The Analytixus MCP Server in Practice

The Starting Point: A Model That’s Missing Something

Step 1: Claude Connects and Immediately Knows What It Can Do

Step 2: “What Is DnAML?” Answered From the Source, Not From Training

Step 3: Establishing Context

Step 4: The Actual Task: Deriving 69 References From Foreign Keys

Step 5: The Correction, and How It Lands Immediately

Step 6: The Closed Loop: Writing With Immediate Validation

App Code Is Craftsmanship, Data & Analytics Is Series Production

Why This Convinces Me More Than Most Agent Demos

Code Generation in Analytixus with Python and Jinja2: A Practical, Scalable Pattern

Why Python + Jinja2 in Analytixus?

The Stable Interface Contract in Analytixus

Project Structure (DBX-Demo-WWI)

Example Metadata (from Analytixus)

The Jinja2 Template

The generate(md) Function in Bronze/02_BronzeStoreWithPy.pyx

Example Output (from the provided metadata)

Operational Flow in Analytixus

Testing and Validation

Extending Beyond CREATE TABLE

Best Practices and Pitfalls

Conclusion

DnAML (Data and Analytics Markup Language): A Practical Guide for Analytixus

What is DnAML?

What can you do with DnAML?

Why DnAML?

How DnAML is used in practice

Workflow: Integrating XML and DnAML

Examples

1) Declaring a source and a table with typed columns and options

2) Bronze-layer view with lineage (Origins) and semantic options

3) Columns, defaults, identity, and constraints

4) Lineage with UNION/JOIN and field mappings

5) References (relationships) with rules

Getting started

Tips and best practices

Advanced: Expressions and operator precedence

Where DnAML fits in Analytixus

Summary

Lean Data Vault: A Pragmatic Approach to Data Modeling in the Lakehouse

Enriching Metadata from Code Comments in Analytixus

From Code to Metadata

Why Comments Matter

From Unsupervised to Supervised Metadata

Benefits of Using Comments as Metadata

Conclusion

The `generate(md)` Function in Bronze/02_BronzeStoreWithPy.pyx